Scaling Bayesian Network Parameter Learning with Expectation Maximization using MapReduce

Authors

  • Erik B. Reed
  • Ole J. Mengshoel
Abstract

Bayesian network (BN) parameter learning from incomplete data can be a computationally expensive task. Applying the Expectation Maximization (EM) algorithm to learn BN parameters is unfortunately susceptible to local optima and prone to premature convergence. We develop and experiment with two methods for improving EM parameter learning by using MapReduce: Age-Layered Expectation Maximization (ALEM) and Multiple Expectation Maximization (MEM). Leveraging MapReduce for distributed machine learning, these algorithms (i) operate on a (potentially large) population of BNs and (ii) partition the data set as is traditionally done with MapReduce machine learning. For example, using the Hadoop implementation of MapReduce, we achieved gains in both parameter quality (likelihood) and number of iterations (runtime) with distributed ALEM on the Asia BN over 20,000 MEM and ALEM trials.
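The abstract gives no pseudocode, but the MEM pattern it describes can be pictured as many independent EM runs, each started from a different random seed, with the best-likelihood result kept at the end; ALEM additionally groups runs into age layers and retires runs that trail their layer. Below is a minimal, hedged sketch of the MEM pattern on a toy model (a two-component Bernoulli mixture standing in for a BN with one hidden node; all function names and constants are illustrative assumptions, not the authors' code):

import math, random

def em_run(data, seed, n_iters=30):
    # Toy stand-in for BN parameter learning: EM for a two-component
    # Bernoulli mixture (one hidden binary node, one observed node).
    rng = random.Random(seed)
    pi = rng.uniform(0.2, 0.8)               # P(hidden = 1)
    theta = [rng.random(), rng.random()]     # P(x = 1 | hidden = k)
    for _ in range(n_iters):
        # E-step: responsibility of component 1 for each observation
        resp = []
        for x in data:
            p1 = pi * (theta[1] if x else 1 - theta[1])
            p0 = (1 - pi) * (theta[0] if x else 1 - theta[0])
            resp.append(p1 / (p0 + p1 + 1e-300))
        # M-step: re-estimate parameters from expected counts
        pi = sum(resp) / len(data)
        for k in (0, 1):
            w = [r if k == 1 else 1 - r for r in resp]
            theta[k] = sum(wi * x for wi, x in zip(w, data)) / (sum(w) + 1e-12)
    ll = sum(math.log(pi * (theta[1] if x else 1 - theta[1]) +
                      (1 - pi) * (theta[0] if x else 1 - theta[0]) + 1e-300)
             for x in data)
    return ll, pi, theta

def mem(data, n_runs=20):
    # MEM in miniature: independent EM runs (one per mapper in a
    # MapReduce setting); the "reduce" keeps the best likelihood.
    return max((em_run(data, seed) for seed in range(n_runs)),
               key=lambda r: r[0])

print(mem([1, 1, 1, 0, 0, 1, 0, 1, 1, 0]))

In an actual Hadoop deployment each em_run would be a map task keyed by seed; the sketch only shows why a population of runs mitigates EM's sensitivity to initialization.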


Similar articles

MapReduce for Bayesian Network Parameter Learning using the EM Algorithm

This work applies the distributed computing framework MapReduce to Bayesian network parameter learning from incomplete data. We formulate the classical Expectation Maximization (EM) algorithm within the MapReduce framework. Analytically and experimentally we analyze the speed-up that can be obtained by means of MapReduce. We present details of the MapReduce formulation of EM, report speed-ups v...
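As a hedged sketch of the formulation this snippet describes: each mapper computes expected sufficient statistics for its data shard (the E-step), and a reducer sums them and re-estimates the parameters (the M-step). The toy one-parameter missing-data model and all names below are assumptions for illustration, not the paper's code:

from functools import reduce

def map_expected_counts(shard, p):
    # E-step (map): expected count of X = 1 on one data shard;
    # missing values (None) contribute their current expectation p.
    ones = sum(p if x is None else x for x in shard)
    return (ones, len(shard))

def reduce_counts(a, b):
    # Reduce: sum the per-shard sufficient statistics.
    return (a[0] + b[0], a[1] + b[1])

def em_mapreduce(shards, p=0.5, n_iters=20):
    for _ in range(n_iters):
        ones, n = reduce(reduce_counts,
                         (map_expected_counts(s, p) for s in shards))
        p = ones / n  # M-step: maximize given the expected counts
    return p

# Example: three shards, each with some missing observations
print(em_mapreduce([[1, 1, None], [0, None, 1], [1, 0, 0]]))

The same map/reduce split applies to full BN learning, where the statistics are expected CPT counts per family rather than a single count.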


Bayesian Network Parameter Learning using EM with Parameter Sharing

This paper explores the effects of parameter sharing on Bayesian network (BN) parameter learning when there is incomplete data. Using the Expectation Maximization (EM) algorithm, we investigate how varying degrees of parameter sharing, varying numbers of hidden nodes, and different dataset sizes impact EM performance. The specific metrics of EM performance examined are: likelihood, error, and the ...
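The snippet is truncated, but the core idea of parameter sharing is that several CPTs are constrained to be identical, so the EM M-step pools their expected counts before normalizing. A minimal sketch for binary nodes (the names and counts are hypothetical, not from the paper):

def m_step_shared(expected_counts, tied_groups):
    # expected_counts: {node: [count_x0, count_x1]} from the E-step.
    # tied_groups: lists of nodes constrained to share one CPT.
    # Pooling counts across a tied group yields one shared estimate,
    # which is then assigned back to every node in the group.
    params = {}
    for group in tied_groups:
        pooled = [sum(expected_counts[n][v] for n in group) for v in (0, 1)]
        shared = pooled[1] / (pooled[0] + pooled[1])
        for n in group:
            params[n] = shared  # every tied node gets the same P(X = 1)
    return params

# Example: nodes "a" and "b" share a CPT; "c" is estimated on its own.
counts = {"a": [3.0, 7.0], "b": [5.0, 5.0], "c": [2.0, 8.0]}
print(m_step_shared(counts, [["a", "b"], ["c"]]))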


Large-Scale Online Expectation Maximization with Spark Streaming

Many “Big Data” applications in Machine Learning (ML) need to react quickly to large streams of incoming data. The standard paradigm nowadays is to run ML algorithms on frameworks designed for batch operations, such as MapReduce or Hadoop. By design, these frameworks are not a good match for low-latency applications. This is why we explore using a new, recently proposed model for large-scale st...
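The snippet cuts off before the method details, but a common way to make EM stream-friendly (e.g., stepwise/online EM in the style of Cappé and Moulines, which may or may not match what this paper uses) blends each mini-batch's expected statistics into a running average with a decaying step size. A rough sketch on the same kind of toy missing-data model:

def online_em(batches, p=0.5, alpha=0.6):
    # Stepwise (online) EM: fold each mini-batch's expected
    # sufficient statistics into a running average with step size
    # t**(-alpha), then re-maximize after every batch.
    stat = (0.5, 1.0)  # running (expected ones, count), arbitrary init
    for t, batch in enumerate(batches, start=1):
        ones = sum(p if x is None else x for x in batch)  # E-step on batch
        eta = t ** (-alpha)                               # decaying step size
        stat = ((1 - eta) * stat[0] + eta * ones,
                (1 - eta) * stat[1] + eta * len(batch))
        p = stat[0] / stat[1]                             # M-step
    return p

print(online_em([[1, None, 1], [0, 1, None], [1, 1, 0]]))

In a Spark Streaming setting, each batch would be one micro-batch RDD and the running statistics would live in updated state.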


A Genetic Algorithm for Learning Parameters in Bayesian Networks using Expectation Maximization

Expectation maximization (EM) is a popular algorithm for parameter estimation in situations with incomplete data. The EM algorithm has, despite its popularity, the disadvantage of often converging to local but non-global optima. Several techniques have been proposed to address this problem, for example initializing EM from multiple random starting points and then selecting the run with the high...
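The truncated sentence points at random restarts; a genetic algorithm goes further by evolving a population of EM runs. A self-contained toy sketch of one such GA/EM hybrid (my own simplified construction on a two-component Bernoulli mixture; population size and mutation scale are arbitrary assumptions):

import math, random

def em_step(pi, theta, data):
    # One EM iteration for a two-component Bernoulli mixture; returns
    # the log-likelihood of the incoming parameters plus the update.
    resp, ll = [], 0.0
    for x in data:
        p1 = pi * (theta[1] if x else 1 - theta[1])
        p0 = (1 - pi) * (theta[0] if x else 1 - theta[0])
        ll += math.log(p0 + p1 + 1e-300)
        resp.append(p1 / (p0 + p1 + 1e-300))
    n, s = len(data), sum(resp)
    new_theta = [
        sum((1 - r) * x for r, x in zip(resp, data)) / (n - s + 1e-12),
        sum(r * x for r, x in zip(resp, data)) / (s + 1e-12),
    ]
    return ll, s / n, new_theta

def clip(v):
    return min(max(v, 0.01), 0.99)

def ga_em(data, pop_size=8, gens=15, seed=0):
    # GA + EM hybrid: each individual is a parameter vector refined by
    # one EM step per generation; the fitter half survives and spawns
    # mutated copies, helping the search escape poor local optima.
    rng = random.Random(seed)
    pop = [(rng.random(), [rng.random(), rng.random()])
           for _ in range(pop_size)]
    for _ in range(gens):
        scored = sorted((em_step(pi, th, data) for pi, th in pop),
                        key=lambda s: s[0], reverse=True)
        parents = [(pi, th) for _, pi, th in scored[: pop_size // 2]]
        children = [(clip(pi + rng.gauss(0, 0.1)),
                     [clip(t + rng.gauss(0, 0.1)) for t in th])
                    for pi, th in parents]
        pop = parents + children
    return max((em_step(pi, th, data) for pi, th in pop),
               key=lambda s: s[0])

print(ga_em([1, 1, 0, 1, 0, 0, 1, 1]))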


Navigating the parameter space of Bayesian Knowledge Tracing models: Visualizations of the convergence of the Expectation Maximization algorithm

Bayesian Knowledge Tracing (KT) models are employed by cognitive tutors to determine student knowledge based on four parameters: learn rate, prior, guess, and slip. A commonly used algorithm for learning these parameter values from data is the Expectation Maximization (EM) algorithm. Past work, however, has suggested that with four free parameters the standard KT model is prone to c...
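Concretely, KT is a two-state hidden Markov model with no forgetting, so the likelihood surface that EM navigates can be written down directly. A sketch of the forward computation (variable names are mine, not the paper's):

import math

def kt_loglik(responses, prior, learn, guess, slip):
    # Forward pass for standard Knowledge Tracing: a two-state HMM
    # (known/unknown) with no forgetting. responses: 1 = correct.
    p_known, ll = prior, 0.0
    for correct in responses:
        p_correct = p_known * (1 - slip) + (1 - p_known) * guess
        ll += math.log(p_correct if correct else 1 - p_correct)
        # posterior probability the skill is known, given the response
        if correct:
            post = p_known * (1 - slip) / p_correct
        else:
            post = p_known * slip / (1 - p_correct)
        # learning transition: unknown -> known with probability `learn`
        p_known = post + (1 - post) * learn
    return ll

# EM searches this four-parameter surface; different initializations
# can end at different local optima, which is what the paper visualizes.
print(kt_loglik([0, 1, 1, 1], prior=0.3, learn=0.2, guess=0.25, slip=0.1))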



Journal title:

Volume   Issue

Pages  -

Publication date: 2012